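Deploying deep neural networks on low-resource edge devices is challenging because of their ever-increasing resource requirements. Recent studies propose multiplication-free neural networks to reduce computation and memory consumption, and shift neural networks are among the most effective tools for achieving these reductions. However, existing low-bit shift networks are less accurate than their full-precision counterparts and, because of inherent design flaws, cannot transfer efficiently to a wide range of tasks. We propose the DenseShift network, which exploits the following novel designs. First, we demonstrate that zero-valued weights in low-bit shift networks neither contribute to model capacity nor simplify model inference. We therefore propose a zero-free shifting mechanism that simplifies inference while increasing model capacity. Second, we design a new metric to measure the weight-freezing problem in training low-bit shift networks, and propose a sign-scale decomposition to improve training efficiency. Third, we propose a low-variance random initialization strategy to improve the model's performance in transfer learning scenarios. We run extensive experiments on a variety of computer vision and speech tasks. The experimental results show that the DenseShift network significantly outperforms existing low-bit multiplication-free networks and achieves performance competitive with its full-precision counterpart. It also exhibits strong transfer learning performance with no drop in accuracy.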
translated by 谷歌翻译
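Deep neural networks (DNNs) have achieved impressive success in multiple domains. Over the years, the accuracy of these models has increased with the proliferation of deeper and more complex architectures. As a result, state-of-the-art solutions are often computationally expensive, which makes them unsuitable for deployment on edge computing platforms. To mitigate the high computation, memory, and power requirements of inference with convolutional neural networks (CNNs), we propose the use of power-of-two quantization, which quantizes continuous parameters into low-bit power-of-two values. This reduces computational complexity by removing expensive multiplication operations and by using low-bit weights. ResNet is adopted as the building block of our solution, and the proposed model is evaluated on a spoken language understanding (SLU) task. Experimental results show improved performance for shift neural network architectures, with our low-bit quantization achieving 98.76% on the test set, which is comparable to its full-precision counterpart and to state-of-the-art solutions.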
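To make the idea of power-of-two quantization concrete, the following is a minimal sketch and not the authors' implementation. It assumes weights are pre-scaled to [-1, 1], that one bit stores the sign and the remaining bits store a shift amount, and that activations are integers; the function names `quantize_pow2` and `shift_multiply` are hypothetical.

```python
import math


def quantize_pow2(w, bits=3):
    """Round a real weight to sign * 2**(-shift) and return (sign, shift).

    Assumption: one bit holds the sign, so 2**(bits - 1) shift amounts exist.
    """
    max_shift = 2 ** (bits - 1) - 1
    sign = -1 if w < 0 else 1
    mag = abs(w)
    if mag == 0.0:
        # Assumption: zero is mapped to the smallest representable magnitude.
        return sign, max_shift
    # Nearest exponent in the log domain, clamped to the representable range.
    shift = round(-math.log2(mag))
    shift = min(max(shift, 0), max_shift)
    return sign, shift


def shift_multiply(x, sign, shift):
    """Multiply an integer activation by sign * 2**(-shift) with a bit shift."""
    y = x >> shift
    return -y if sign < 0 else y


# A weight of 0.3 is stored as +2**-2, so 64 * 0.3 is approximated by 64 >> 2.
sign, shift = quantize_pow2(0.3)
approx = shift_multiply(64, sign, shift)
```

The multiplication in the inner product is replaced by a right shift and an optional negation, which is where the reduction in computational cost comes from.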
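Field robotic harvesting is a promising technique in the recent development of the agricultural industry. It is vital for robots to recognise and localise fruits before harvesting in natural orchards. However, the workspace of harvesting robots in orchards is complex: many fruits are occluded by branches and leaves. It is therefore important to estimate a proper grasping pose for each fruit before performing the manipulation. In this study, a geometry-aware network, A3N, is proposed to perform end-to-end instance segmentation and grasping estimation using both colour and geometry sensory data from an RGB-D camera. In addition, workspace geometry modelling is applied to assist the robotic manipulation. Moreover, we implement a global-to-local scanning strategy, which enables robots to accurately recognise and retrieve fruits in field environments with two consumer-level RGB-D cameras. We also comprehensively evaluate the accuracy and robustness of the proposed network. The experimental results show that A3N achieves an instance segmentation accuracy of 0.873, with an average computation time of 35 ms. The average accuracy of grasping estimation is 0.61 cm in centre and 4.8° in orientation. Overall, the robotic system that uses global-to-local scanning and A3N achieves a harvesting success rate of 70%-85% in field harvesting experiments.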
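Language-specific pre-trained models have proven to be more accurate than multilingual ones in a monolingual evaluation setting, and Arabic is no exception. However, we found that previously released Arabic BERT models were significantly under-trained. In this technical report, we present JABER, Junior Arabic BERT, our pre-trained language model prototype dedicated to Arabic. We conduct an empirical study to systematically evaluate the performance of models across a diverse set of existing Arabic NLU tasks. Experimental results show that JABER achieves state-of-the-art performance on ALUE, a new benchmark for Arabic Language Understanding Evaluation, as well as on a well-established NER benchmark.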
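Mobile network traffic forecasting is one of the key functions in daily network operation. A commercial mobile network is large, heterogeneous, complex, and dynamic. These intrinsic features leave mobile network traffic forecasting far from solved, even with recent advanced algorithms such as graph convolutional network-based prediction approaches and various attention mechanisms, which have proven successful in vehicle traffic forecasting. In this paper, we cast the problem as a spatial-temporal sequence prediction task. We propose a novel deep learning network architecture, Adaptive Multi-receptive Field Spatial-Temporal Graph Convolutional Network (AMF-STGCN), to model the traffic dynamics of mobile base stations. AMF-STGCN extends GCN by (1) jointly modelling the complex spatial-temporal dependencies in mobile networks, (2) applying attention mechanisms to capture the various receptive fields of heterogeneous base stations, and (3) introducing an extra decoder based on a fully connected deep network to overcome the error propagation challenge in multi-step forecasting. Experiments on four real-world datasets from two different domains consistently show that AMF-STGCN outperforms state-of-the-art methods.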
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes given only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After these steps, performance on novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots; e.g., we boost nAP by +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and models will be made available.
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first use a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two important issues: 1) Boundary-bias: the annotated target segment generally refers to two specific frames as its start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as the new boundaries. 2) Reasoning-bias: such incorrect new boundary frames also bias the reasoning during frame-query interaction, reducing the generalization ability of the model. To alleviate these limitations, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames that enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationships among these frames and to generate soft labels on the boundaries for more accurate frame-query reasoning. This mechanism can also supplement the sparsely sampled frames with the missing consecutive visual semantics for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
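The boundary-bias issue described above can be illustrated with a small sketch. It is not part of SSRN; it only assumes uniform sparse sampling and that the nearest sampled frame stands in for each annotated boundary. The function names `uniform_sample` and `snapped_boundaries` are hypothetical.

```python
def uniform_sample(num_frames, num_samples):
    """Return num_samples frame indices spread evenly over [0, num_frames)."""
    step = num_frames / num_samples
    # Take the frame at the centre of each equally sized chunk.
    return [int(step * i + step / 2) for i in range(num_samples)]


def snapped_boundaries(start, end, sampled):
    """Map annotated start/end frames to the nearest sampled frames."""
    def nearest(t):
        return min(sampled, key=lambda s: abs(s - t))
    return nearest(start), nearest(end)


# A 100-frame video reduced to 8 frames; the annotated segment is [25, 60].
sampled = uniform_sample(100, 8)
new_start, new_end = snapped_boundaries(25, 60, sampled)
```

In this toy case neither annotated frame survives the downsampling, so the segment is effectively relabelled as frames 31 to 56, a shift of 6 and 4 frames at the two ends.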
New architecture GPUs like A100 are now equipped with multi-instance GPU (MIG) technology, which allows the GPU to be partitioned into multiple small, isolated instances. This technology provides more flexibility for users to support both deep learning training and inference workloads, but efficiently utilizing it can still be challenging. The vision of this paper is to provide a more comprehensive and practical benchmark study for MIG in order to eliminate the need for tedious manual benchmarking and tuning efforts. To achieve this vision, the paper presents MIGPerf, an open-source tool that streamlines the benchmark study for MIG. Using MIGPerf, the authors conduct a series of experiments, including deep learning training and inference characterization on MIG, GPU sharing characterization, and framework compatibility with MIG. The results of these experiments provide new insights and guidance for users to effectively employ MIG, and lay the foundation for further research on the orchestration of hybrid training and inference workloads on MIGs. The code and results are released on https://github.com/MLSysOps/MIGProfiler. This work is still in progress and more results will be published soon.
Inferring missing links or detecting spurious ones based on observed graphs, known as link prediction, is a long-standing challenge in graph data analysis. With the recent advances in deep learning, graph neural networks have been used for link prediction and have achieved state-of-the-art performance. Nevertheless, existing methods developed for this purpose are typically discriminative, computing features of local subgraphs around two neighboring nodes and predicting potential links between them from the perspective of subgraph classification. In this formalism, the selection of enclosing subgraphs and heuristic structural features for subgraph classification significantly affects the performance of the methods. To overcome this limitation, this paper proposes a novel and radically different link prediction algorithm based on the network reconstruction theory, called GraphLP. Instead of sampling positive and negative links and heuristically computing the features of their enclosing subgraphs, GraphLP utilizes the feature learning ability of deep-learning models to automatically extract the structural patterns of graphs for link prediction under the assumption that real-world graphs are not locally isolated. Moreover, GraphLP explores high-order connectivity patterns to utilize the hierarchical organizational structures of graphs for link prediction. Our experimental results on all common benchmark datasets from different applications demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Unlike the discriminative neural network models used for link prediction, GraphLP is generative, which provides a new paradigm for neural-network-based link prediction.
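For readers unfamiliar with the heuristic structural features that the abstract contrasts GraphLP with, the sketch below shows one classic example, the common-neighbors score. It is not GraphLP and not taken from the paper; the function names and the toy graph are assumptions chosen for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def rank_missing_links(edges):
    """Score every unobserved node pair by its number of common neighbors."""
    adj = build_adjacency(edges)
    scores = []
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores.append(((u, v), len(adj[u] & adj[v])))
    # Highest score first; ties keep their alphabetical order.
    return sorted(scores, key=lambda item: -item[1])


# Toy graph (hypothetical data).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
ranked = rank_missing_links(edges)
```

Here the pair (a, d) ranks first because the two nodes share two neighbors. A hand-picked score of this kind is the sort of heuristic feature whose choice, according to the abstract, strongly affects the performance of discriminative methods.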